Genome-Wide Characterization of a Highly Penetrant Form of Hyperlipoprotein(a)emia Associated With Genetically Elevated Cardiovascular Risk

Background: Lp(a) (lipoprotein [a]) is a highly atherogenic lipoprotein strongly associated with coronary artery disease (CAD). Lp(a) concentrations are chiefly determined genetically. Investigation of large pedigrees with extreme Lp(a) using modern whole-genome approaches may unravel the genetic determinants underpinning this pathological phenotype.
Methods: A large family characterized by high Lp(a) and increased CAD incidence was recruited by cascade screening. Plasma lipid, lipoprotein, and apolipoprotein concentrations, as well as the size of apo(a) isoforms, were determined enzymatically, by high-resolution mass spectrometry, and by Western blot, respectively. Whole-exome sequencing was performed to search for rare defects in modifier genes. Genetic risk scores (GRS) for Lp(a) and CAD were calculated and their discriminative power was assessed.
Results: Seventeen individuals displayed extreme Lp(a) levels, including 6 with CAD. Whole-exome sequencing showed no evidence of genetic defects outside the LPA locus. The extreme Lp(a) phenotype segregated with the presence of a short apo(a) isoform containing 21 Kringle IV domains. This allele was characterized by the presence of three rare, strongly Lp(a)-increasing single nucleotide polymorphisms and a significantly increased load of oxidized phospholipids per Lp(a) particle. An Lp(a) GRS consisting of 48 single nucleotide polymorphisms that represent 2001 genome-wide significant LPA single nucleotide polymorphisms efficiently captured the hyper-Lp(a) phenotype and discriminated affected and nonaffected individuals with great accuracy. The genome-wide GRS for CAD, encompassing 6.6 million single nucleotide polymorphisms, was very high for most family members (>97.5th percentile of the reference population), but this observation was no longer valid when the contribution of the LPA locus was omitted.
Conclusions: High-Lp(a) phenotypes can be successfully captured using the Lp(a) GRS even among closely related family members. In hyper-Lp(a) individuals, LPA can be a major locus driving a very high CAD GRS. This underpins the large contribution of the LPA locus to the cardiovascular genetic risk in families.

Lp(a) (lipoprotein [a]) is a highly atherogenic lipoprotein causatively, independently, and significantly associated with cardiovascular diseases and calcified aortic valve stenosis. 1,2 The major structural difference between Lp(a) and LDL (low-density lipoproteins) is that Lp(a) contains a unique signature protein, apolipoprotein(a) (apo[a]), covalently linked to apoB100. 3 The atherogenicity of Lp(a) stems not only from its cholesterol-rich LDL moiety but also from its role as a sink for oxidized phospholipids (oxPL). 3,4 Apo(a) is the product of the LPA gene located on chromosome 6q26-27. 1 It presents a highly repetitive structure consisting of 10 subtypes of the plasminogen-derived KIV (kringle IV) domains (KIV-1 to KIV-10), followed by one kringle V and one inactive protease domain. Two enhancer regions (named DH-II and DH-III [DNase Hypersensitive sites II and III]) have been identified ≈20 to 30 kb upstream of LPA, 5 and the promoter region extends for at least 4 kb upstream of LPA. 6,7 The KIV-2 domain is encoded by a pair of exons that can be repeated 1 to 40 times per allele. 8 The major consequence of this copy number variation is that the size of apo(a) is highly polymorphic, its molecular weight ranging from 300 to 800 kDa. Apo(a) isoform size is inversely correlated with plasma Lp(a) concentrations, as it correlates with endoplasmic reticulum retention time, and explains 30% to 70% of Lp(a) variability. 1 Overall, the whole LPA locus explains up to 90% of Lp(a) variability, 9 indicating that additional strong modulators of Lp(a) concentrations reside within the LPA locus and account for the fact that Lp(a) levels can vary by 200-fold even among carriers of apo(a) isoforms of identical sizes. 10 In line with this, same-sized LPA alleles still differ in the haplotypes of the single nucleotide polymorphisms (SNPs) they carry. 11,12
For instance, SNPs rs10455872 and rs3798220 are widely used in the field, as they allow partial tagging of small apo(a) isoforms. 1,13 In addition, specific SNP haplotypes associate with Lp(a) concentrations that can be much lower or much higher than what would be expected from the size of the isoforms alone. 11,12 Some examples of such variants have been reported, 14-16 but many more probably exist. Genome-wide association studies (GWAS) have identified hundreds of variants associated with Lp(a) levels. 17,18 A majority of these variants are distributed over a ≈2-megabase region around the LPA locus, 17,18 but causality for modulating Lp(a) levels has been established only for a handful. The important contribution of additional SNPs to Lp(a) concentration is also illustrated by the fact that within families, same-sized apo(a) isoforms are associated with a much narrower Lp(a) variability (2- to 3-fold) than in the general population. 10 The occurrence of high-expressing LPA alleles can thus confer a highly penetrant cardiovascular risk to individuals and families, 19-22 similar to what is seen in familial hypercholesterolemia. 23,24 Although LPA is the major determinant of Lp(a) in the population, it is unclear whether rare defects in other genes 25 can also be at the origin of some hyper-Lp(a) phenotypes. Investigations of pedigrees with extreme phenotypes using modern whole-genome technologies might help unravel the genetic determinants of hyperlipoprotein(a)emia. To test this possibility, we have undertaken a comprehensive investigation of a unique pedigree recruited through cascade screening from an individual with no cardiovascular risk factors other than an extreme Lp(a) concentration who underwent recurrent coronary syndromes, 20 using whole-exome sequencing, targeted analysis of the LPA locus, genetic risk score (GRS) computation for Lp(a) and coronary artery disease (CAD), as well as extensive biochemical assessment of their Lp(a).

METHODS
Detailed methods are available in the Supplemental Methods and Tables S1 through S5 in the Supplemental Material. Ethics approval was granted by the Comité de Protection des Personnes Sud Méditerranée (ID: 2020-A00196-33). Informed consent was obtained from all participants, and all studies were performed in accordance with the Declaration of Helsinki. All participants gave written informed consent for genetic testing and research. The data that support the findings of this study are available from the corresponding author upon reasonable request.

Unique Multigenerational Pedigree
Seventeen related family members and five spouses were recruited through the index patient (III-A3) by cascade screening (Figure 1, Table S1 in the Supplemental Material). The 17 relatives descend from unrelated grandparents from La Réunion Island. The grandfather (I-1), who was a heavy smoker, had myocardial infarction (MI) at age 60 years and died from lung cancer at 80 years. The grandmother (I-2) died at 82 from recurrent episodes of MI and stroke. Among their 5 children (3 males/2 females), 2 sons (II-B1 and II-D1) had MI at ages 52 and 50 years, respectively. One was and still is overweight (body mass index, 29.4 kg/m²) and both smoked. One daughter (II-C2), also Among the 14 family members of the third generation, in addition to the index case patient (III-A3) who had recurrent MI episodes at age 32 years, 20 two of his male cousins had severe MI at ages 27 (III-C4) and 35 (III-B1), respectively. The first smoked occasionally and had slightly elevated total cholesterol (6.52 mmol/L) and triglycerides (2.20 mmol/L). The second was hypertensive (systolic blood pressure/diastolic blood pressure 150/90 mm Hg) and overweight (body mass index, 29.3 kg/m²). Among the 16 family members who have not developed any cardiovascular event yet, 8 were overweight, 3 were hypertensive, and 1 smoked. Seven had elevated total cholesterol levels (>5 mmol/L). Not a single family member displayed impaired renal function or aortic valve stenosis. Strikingly, plasma Lp(a) concentrations were above the threshold of 125 nmol/L (ranging from 156 to 775 nmol/L) in all relatives but one (III-B3), as well as in 1 of the 5 spouses (II-C1).
Of note, LDL-C (LDL-cholesterol) levels were on average indistinguishable between family members with or without Lp(a) above 125 nmol/L (3.03±0.73 versus  Table S1 in the Supplemental Material). Most individuals in this large pedigree, therefore, display hyperlipoprotein(a)emia without truly elevated LDL-C levels.

Whole-Exome Sequencing
We first hypothesized that a yet unidentified putative receptor or regulator defect might cause the extraordinarily high Lp(a) concentrations observed in a majority of pedigree members. Although anecdotal reports about hyper-Lp(a) pedigrees are known in the field, a comprehensive genetic evaluation of such a pedigree has not been performed before. We thus performed whole-exome sequencing in 13 family members (9 with high Lp[a]); the other pedigree members joined the study after completion of this analysis. This yielded 316 251 SNPs and 52 676 indels. A total of 91 662 SNPs and 6709 indels were retained after filtering for a minimum coverage of 4× and localization within ±50 base pairs from any exon annotated in National Center for Biotechnology Information Reference Sequence Database Release 105. Seventeen SNPs in 13 genes (8 missense variants, seven 3′ untranslated region SNPs, two 5′ untranslated region SNPs) and no indels segregated exclusively with the high-Lp(a) phenotype (assuming a dominant mode of inheritance; Table S6 in the Supplemental Material). None of these genes except LPA has any known connection to lipid metabolism or was a plausible candidate (Supplemental Notes). Also, relaxation of filtering parameters by allowing for one individual being a phenocopy or up to 3 individuals being also homozygous for causal variants (in case the causal variant is unexpectedly frequent), as well as manual inspection of the candidate Lp(a) receptors recently reported, 26 did not reveal additional candidate variants (Table S7 in the Supplemental Material, Supplemental Notes). Taken together, whole-exome sequencing data provided no clear evidence of a receptor defect, a conclusion also supported by the similar cellular uptake of Lp(a) observed in lymphocytes isolated from family members with high versus normal Lp(a) phenotypes (Supplemental Results and Figure S1 in the Supplemental Material).
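The segregation filter described above can be illustrated with a minimal sketch. It assumes a simplified reading of the dominant model (every affected individual carries at least one alternate allele, no unaffected individual carries any) and an optional allowance for phenocopies. The function names, sample labels, and genotypes are hypothetical and are not taken from the study pipeline.

```python
from typing import Dict, List

# Genotype coding (assumption): number of alternate alleles per sample, 0, 1, or 2.
Genotypes = Dict[str, int]


def segregates_dominant(genotypes: Genotypes,
                        affected: Dict[str, bool],
                        max_phenocopies: int = 0) -> bool:
    """Return True if a variant is compatible with dominant segregation.

    Unaffected individuals must not carry the variant. Affected individuals
    must carry it, except for at most `max_phenocopies` affected non-carriers.
    """
    affected_noncarriers = sum(
        1 for sample, is_affected in affected.items()
        if is_affected and genotypes[sample] == 0
    )
    unaffected_carriers = sum(
        1 for sample, is_affected in affected.items()
        if not is_affected and genotypes[sample] > 0
    )
    return unaffected_carriers == 0 and affected_noncarriers <= max_phenocopies


def filter_variants(variants: Dict[str, Genotypes],
                    affected: Dict[str, bool],
                    max_phenocopies: int = 0) -> List[str]:
    """Return the IDs of variants that pass the segregation filter."""
    return [vid for vid, gts in variants.items()
            if segregates_dominant(gts, affected, max_phenocopies)]


# Toy pedigree: three affected (A, B, C) and two unaffected (D, E) individuals.
affected = {"A": True, "B": True, "C": True, "D": False, "E": False}
variants = {
    "var1": {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0},  # perfect segregation
    "var2": {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0},  # one affected non-carrier
    "var3": {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0},  # carried by an unaffected
}

strict = filter_variants(variants, affected)
relaxed = filter_variants(variants, affected, max_phenocopies=1)
```

With the toy data, the strict filter keeps only `var1`, whereas allowing one phenocopy also keeps `var2`. This mirrors the relaxation step in spirit only; the homozygosity allowance mentioned in the text is not modeled here.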

LPA Gene Locus
Elevated Lp(a) levels were systematically associated with the presence of one apo(a) isoform containing 21 KIV domains in this family, except for 2 individuals: II-C1, who entered the pedigree by marriage and presents another isoform combination associated with high Lp(a) (15 KIV and 20 KIV), as well as his daughter III-C3, who inherited the 15 KIV allele (Figure 1). The high-expressing 21 KIV allele was characterized by concomitant occurrence of the rare LPA SNPs rs3798220 (protease domain), rs186696265 (enhancer region; ≈26 kb upstream of LPA), and rs140570886 (KIV-6, intronic; Tables S6, S8, and S9 in the Supplemental Material). Among carriers of that allele, the expression of the 21 KIV isoform was predominant, accounting on average for 86±14% of total Lp(a) (Table S9 in the Supplemental Material). The rare variants rs186696265 and rs140570886 were the strongest Lp(a)-increasing SNPs in a recent GWAS. 17 The LPA allele with 21 KIV of individual II-A2 was isolated by long-range pulsed-field gel electrophoresis and the LPA enhancer region was subjected to Sanger sequencing. This confirmed that rs186696265 is located on the 21 KIV allele (Table S8 in the Supplemental Material). Phased genotypes from imputation indicate that the minor alleles of rs3798220 and rs140570886 are on the same chromosome as the minor allele of rs186696265. This is in accordance with the observed SNP segregation patterns (Table S9 in the Supplemental Material). In the general population, these SNPs are in only partial linkage disequilibrium and have been independently linked to considerably increased Lp(a) concentrations (+43 to +64 mg/dL) 14,17 and, in the case of rs3798220, also to increased oxPL load. 27 All 3 SNPs were also significantly associated with increased LPA mRNA expression in liver in The Genotype-Tissue Expression Project (rs3798220: P=4.7×10⁻⁸; rs186696265: P=0.00073; rs140570886: P=1.2×10⁻⁷).
Additionally, we observed at least one G allele of rs9347440 (minor allele frequency: 43.6% in Europeans; 59.4% in South Asians) in all hyper-Lp(a) individuals (Tables S8 and S9 in the Supplemental Material). This SNP has been previously linked to +250% in enhancer activity and +70% Lp(a) production. 28 Its correlation with GWAS hits has not been reported. Other previously reported enhancer SNPs were inconclusive 28 (Tables S8 and S9 in the Supplemental Material). No other variants segregating with the 21 KIV allele were observed in a ≈5 kb region around the enhancer regions DH-II and DH-III 5 (except the frequent variant rs59872631, minor allele frequency=0.278) or in the >4 kb promoter region. 6,7 Previously reported functional LPA SNPs are shown in Table S9 in the Supplemental Material. Of note, no role for rs10455872 was found, as this SNP was present only in two individuals carrying the 20 KIV allele that entered the pedigree by marriage (spouse II-C1 and his daughter III-C2; Table S9 in the Supplemental Material). Taken together, these results indicate that the high Lp(a) levels observed in this family are caused by a single high-expressing LPA allele with 21 KIV characterized by the concomitant presence of multiple Lp(a)-increasing SNPs (rs3798220, rs186696265, rs140570886).

Oxidized Phospholipids
LPA rs3798220 (Ile4399Met) has been associated with elevated oxPL in apoB containing lipoproteins. 27 We thus measured the content of oxPL specifically associated with Lp(a) in all family members (Table S1 in

GRS for Lp(a)
Principal component analysis using whole-genome microarray data indicated that most pedigree members clustered close to European, as well as Middle Eastern and South Asian groups ( Figure S2

Impact on the GRS for CAD
Although most family members show increased Lp(a), not all have experienced premature CAD. To assess the polygenic CAD risk background and the impact of the LPA locus on it, we computed the polygenic CAD GRS recently published by Khera et al 29 for the pedigree and all reference groups. The CAD scores showed very similar distribution in 1000G Europeans, 1000G South Asians, and KORA F4. All CAD cases but one (II-C1) showed a CAD score above the 97.5th percentile of KORA F4 (Figure 3). Similar observations were made using the 1000G Europeans and South Asians reference groups (Figure S4 in the Supplemental Material).
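Placing an individual score within a reference distribution, as done above for the 97.5th percentile of KORA F4, can be sketched as follows. This is a minimal illustration that assumes an empirical percentile (the share of reference scores at or below the individual score). The function names and the toy reference values are hypothetical and do not reproduce the study data.

```python
from bisect import bisect_right
from typing import Sequence


def percentile_rank(score: float, reference_scores: Sequence[float]) -> float:
    """Percentage of reference scores that are less than or equal to `score`."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def exceeds_percentile(score: float,
                       reference_scores: Sequence[float],
                       threshold: float = 97.5) -> bool:
    """True if the score lies above the given percentile of the reference."""
    return percentile_rank(score, reference_scores) > threshold


# Toy reference distribution: 100 evenly spaced scores (1, 2, ..., 100).
reference = list(range(1, 101))

rank_high = percentile_rank(98, reference)
flag_high = exceeds_percentile(98, reference)
flag_mid = exceeds_percentile(50, reference)
```

In practice the reference scores would be computed with the same SNP set and weights as the individual score, since percentiles are only comparable within one scoring scheme.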
Most notably, despite the fact that the genome-wide GRS for CAD encompasses 6.6 M SNPs, removal of the broader sense LPA locus (defined as the ≈1.76 megabases  Figure 3). Finally, we also hypothesized that a higher propensity to thrombotic events might enhance the effects of Lp(a) on CAD and used a GRS for venous thromboembolism (VTE) as a proxy for a putative thrombophilic genetic background in this family. No increased genetic risk for venous thromboembolism was seen in this family using the GRS of Klarin et al 30 ( Figure S5 in the Supplemental Material), even if some family members carry the prothrombin variant G20210A previously reported for the index patient 20 (Table S10 in the Supplemental Material). Taken together, these results demonstrate the major contribution of the LPA locus to the elevated genetic risk of CAD in this family.
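The effect of omitting one locus from a genome-wide score can be sketched as below. This is a minimal illustration that assumes the usual additive form of a GRS (sum of per-variant weights multiplied by effect-allele dosages). The class name, variant identifiers, coordinates, weights, and the excluded window are all hypothetical placeholders; they are not the boundaries of the LPA region used in the study.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ScoreVariant:
    chrom: str     # chromosome label
    pos: int       # position (toy coordinates, not real genome positions)
    weight: float  # per-allele effect weight of the score


Region = Tuple[str, int, int]  # (chromosome, start, end), inclusive


def genetic_risk_score(variants: Dict[str, ScoreVariant],
                       dosages: Dict[str, float],
                       exclude: Optional[Region] = None) -> float:
    """Additive score: sum of weight x dosage, skipping variants in `exclude`."""
    total = 0.0
    for variant_id, variant in variants.items():
        if exclude is not None:
            chrom, start, end = exclude
            if variant.chrom == chrom and start <= variant.pos <= end:
                continue  # variant lies inside the excluded window
        total += variant.weight * dosages.get(variant_id, 0.0)
    return total


# Toy score with three variants; "v2" sits inside the hypothetical window.
variants = {
    "v1": ScoreVariant(chrom="1", pos=500, weight=0.25),
    "v2": ScoreVariant(chrom="6", pos=150, weight=0.5),
    "v3": ScoreVariant(chrom="6", pos=900, weight=0.125),
}
dosages = {"v1": 2.0, "v2": 1.0, "v3": 1.0}
hypothetical_window: Region = ("6", 100, 200)

full_score = genetic_risk_score(variants, dosages)
score_without_window = genetic_risk_score(variants, dosages,
                                          exclude=hypothetical_window)
```

Because the score is additive, the difference between the two values equals the contribution of the excluded window, which is what makes a locus-specific comparison of this kind interpretable.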

DISCUSSION
The Lp(a) trait is mostly controlled by the complex LPA gene locus, 17,18 but the metabolic pathways governing Lp(a) plasma concentrations remain elusive. 1 Neither biochemical studies nor large GWAS have conclusively identified a catabolic receptor or other genes with a major impact on Lp(a) concentrations. 17,18,26 Despite this nearly monogenic regulation, most Lp(a) epidemiology currently focuses on population studies, whereas family studies using whole-genome approaches have not been pursued. Indeed, although LPA has been clearly established as the primary locus regulating Lp(a) concentrations in the population, it is unclear whether other rare gene defects that may cause hyper-Lp(a) also exist. We, therefore, performed a comprehensive genetic characterization of a unique pedigree characterized by high Lp(a) and increased CAD incidence. The hyper-Lp(a) phenotype in the present pedigree was found to segregate with a strongly expressed LPA allele with 21 KIV (isoform 21) with no obvious contribution of coding variation in other genes (Supplemental Notes).
The isoform 21 that segregated with the phenotype was characterized by the presence of three LPA SNPs: rs186696265, rs140570886, and rs3798220. Rs186696265 is located close to both known enhancer regions upstream of LPA 5 and is the SNP with the strongest effect in the GWAS of Mack et al 17 (+64.7 mg/dL and +49.1 mg/dL in the base and isoform-adjusted models; +47.6 mg/dL and +24.8 mg/dL in the respective joint models with all other independent GWAS hits). Rs140570886, located in the intron of KIV-6, was the second strongest SNP in the same study after adjusting the GWAS for apo(a) isoforms measured by Western blot to detect SNPs that modify Lp(a) beyond the isoform effect (single SNP model: +43.3 mg/dL; joint model: +23.8 mg/dL). Both SNPs contributed independently to Lp(a) concentrations even if included in the same regression model, 17 indicating an at least partially additive effect. The third SNP, rs3798220, was associated with a high-expressing short apo(a) isoform in an Austrian family. 31 The observation in Arai et al 27 and in the present work that this missense variant is associated with an increased oxPL load per Lp(a) particle is probably the mechanism by which this variant might contribute to increased atherogenicity. All 3 SNPs have been reported by several GWAS on Lp(a), dyslipidemias, CAD risk, and related phenotypes. 32 A very strongly expressed 21 KIV allele of the LPA locus, carrying a high load of Lp(a)-increasing variants, might thus suffice as the primary cause of the hyper-Lp(a) phenotype in this family. We used a GRS to additionally quantify the cumulative contribution of the genetic variability at the LPA locus (≈2000 SNPs captured via linkage disequilibrium) to the hyper-Lp(a) phenotype in this family. All but one individual with high Lp(a) showed an Lp(a) score close to or within the top 5% of the reference populations.
The hyper-Lp(a) individuals could thus be efficiently discriminated from their relatives with normal Lp(a) using an Lp(a) GRS. Interestingly, the only individual with high Lp(a) but a low Lp(a) GRS was an individual who had not inherited isoform 21, but isoform 15. Despite being shorter, isoform 15 was associated with somewhat lower Lp(a) than isoform 21, supporting the notion that genetic variants modify Lp(a) concentrations beyond what is determined by the size of the isoforms alone. 11,12,15 Our observations are in line with recent reports from the UK Biobank, where the Lp(a) GRS closely resembled the directly measured Lp(a) values 33 and offered an improvement in risk prediction comparable to that of directly measured Lp(a). 34 In datasets with genetic information but without directly measured Lp(a), the Lp(a) GRS might thus be a valid surrogate for Lp(a) plasma levels, as the effect of the GRS on cardiovascular risk appears fully mediated by its effect on Lp(a) concentrations. 34-37 However, these analyses were conducted in a large population-based cohort, and it is unclear how well an Lp(a) GRS might discriminate between closely related individuals. Our data show that an Lp(a) GRS is discriminative even within families, capturing at least the most highly expressing alleles like the present 21 KIV isoform.
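One common way to quantify how well a score separates two groups is the area under the ROC curve (AUC), which equals the probability that a randomly chosen affected individual has a higher score than a randomly chosen unaffected one. The sketch below illustrates this statistic on made-up scores; it is an assumption chosen for illustration, as the text above does not state which discrimination measure was used.

```python
from typing import Sequence


def auc(scores_affected: Sequence[float],
        scores_unaffected: Sequence[float]) -> float:
    """AUC as the share of affected/unaffected pairs ranked correctly.

    Ties between an affected and an unaffected score count as half a win.
    """
    wins = 0.0
    for a in scores_affected:
        for u in scores_unaffected:
            if a > u:
                wins += 1.0
            elif a == u:
                wins += 0.5
    return wins / (len(scores_affected) * len(scores_unaffected))


# Made-up scores: one affected individual scores below one unaffected relative.
toy_affected = [3.0, 2.5, 1.0]
toy_unaffected = [0.5, 1.5]

toy_auc = auc(toy_affected, toy_unaffected)      # 5 of 6 pairs ranked correctly
perfect_auc = auc([2.0, 3.0], [0.0, 1.0])        # complete separation
```

An AUC of 1.0 corresponds to complete separation of the two groups, and 0.5 to no discrimination. The pairwise form is quadratic in group size, which is unproblematic for a pedigree of this size.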
The pedigree was also characterized by a high incidence of CAD at a relatively young age. Hypothesizing that the role of Lp(a) in determining the CAD risk may be further modified by an unfavorable polygenic background, we quantified the genome-wide polygenic contribution to CAD risk using a recently published genomic CAD GRS. 29 All CAD cases but one showed a CAD GRS above the 97.5th percentile of the score distribution in KORA F4. Most notably, however, exclusion of the LPA region from the score computation significantly lowered the CAD risk in these individuals, putting almost all of them below the 95th percentile. Given that the LPA locus chiefly determines Lp(a) plasma levels, this observation indirectly establishes that Lp(a) concentrations are a driving factor for CAD in this family. Considering that the CAD GRS contains 6.6 M SNPs, this is noteworthy and underscores the large contribution of the LPA locus to the genetic risk in this pedigree. A similar observation in UK Biobank has been posted recently on medRxiv, 38 supporting that in high-Lp(a) individuals the CAD GRS is indeed strongly determined by the LPA locus. Accordingly, an additive and even partially multiplicative effect of high Lp(a) and family history of CAD was recently reported. 39 Conversely, we did not observe any increase in VTE GRS in this pedigree. This may appear surprising, given the assumed prothrombotic effects of Lp(a), but recent large Mendelian randomization studies on Lp(a) and VTE have also been negative. 40-42 Only one study has reported an association between very high Lp(a) and VTE, 43 but this effect might not be properly captured using a VTE GRS. Although Lp(a) might not play a substantial role in systemic thrombosis, local prothrombotic effects at the site of atherosclerotic lesions are conceivable.
GRS may be rapidly approaching clinical application, and a high GRS for CAD found in any given person or family will lead to the question "Which gene loci are primarily driving this risk?" In individuals with very high Lp(a) plasma concentrations, it will also be relevant to determine which other genetic factors are contributing to their CAD GRS. The sharp reduction in the CAD GRS after exclusion of the LPA gene region observed in this pedigree pinpoints the LPA locus as the major cause in this pedigree and provides a strong rationale to use Lp(a)-lowering therapies currently in development that specifically target LPA mRNA and thereby reduce Lp(a) plasma levels. 44

Limitations
We acknowledge that our investigation focused on a single yet large pedigree. Our approach can, however, be generalized in cohorts including either many pedigrees or a large number of unrelated hyper-Lp(a) individuals and matching controls. Sequencing studies in such cohorts have the potential to provide considerably larger datasets than single family-based studies. The present work may guide such endeavors. We also acknowledge that the selection of appropriate reference populations for genetic studies is inherently difficult, given the diverse ethnic background of La at least of the LPA locus is supported by the observation that rs3798220 segregated with a short apo(a) isoform, an association seen in Europeans but not in South Asians. 45 This variant is not found at all in Africans. 45 Likewise, rs140570886 and rs186696265 are 2.5× to 5× rarer in South Asians than in Europeans and absent in Africans. We thus consider that our reference populations were appropriate, even if more ethnically diverse reference groups would have been ideal. Finally, our study assumes a causal SNP that is segregating within the pedigree in a Mendelian fashion. We are aware that we would not have sufficient power to detect allelic heterogeneity, that is, different mutations at the same locus causing the same phenotype, albeit this seems a rather unlikely possibility in the present pedigree.

Conclusions
Although some case reports about hyper-Lp(a) individuals and pedigrees have been published, none displays a thorough genetic characterization with whole-genome and whole-exome technologies. We here demonstrate that the Lp(a) GRS can successfully capture a hyper-Lp(a) phenotype also within pedigrees, despite the considerably higher relatedness they present compared with the general population. We also demonstrate in high-Lp(a) individuals that the CAD GRS can be strongly determined by the LPA locus. Although direct Lp(a) quantification is the preferred measure, in a future with individual genomic data being broadly available, routine determination of Lp(a) GRS may provide an actionable screening tool for cardiovascular risk prediction both in pedigrees and population.  These agencies had no role in the design and conduct of the study, in the collection, analysis, and interpretation of the data, and in the preparation, review, or approval of the article.

Disclosures
Dr Lambert reports research grants and personal fees from Nyrada Ltd. These industries had no role in the design and conduct of the study, in the collection, analysis, and interpretation of the data, and in the preparation, review, or approval of the article. The other authors report no conflicts.